Ensemble Clustering based on Heterogeneous Dimensionality Reduction Methods and Context-dependent Similarity Measures
Abstract
This paper presents a method for clustering a high-dimensional dataset using dimensionality reduction and context dependency measures (CDM). First, the dataset is partitioned into a predefined number of clusters using CDM alone. Next, CDM is combined with each of several dimensionality reduction techniques, and the dataset is clustered again for each combination. The resulting partitions are merged using the cluster ensemble approach. Finally, the Rand index is used to measure how well the ensemble result preserves the clustering of the original dataset obtained with CDM alone.
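The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define CDM, so Euclidean k-means stands in for the CDM-based clustering step; PCA, truncated SVD, and Gaussian random projection are assumed examples of "several dimensionality reduction techniques"; and the ensemble step uses a co-association matrix with average-linkage clustering, which is one common consensus scheme. The names `cluster`, `consensus`, and `run_pipeline` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.metrics import rand_score
from sklearn.random_projection import GaussianRandomProjection


def cluster(X, k, seed=0):
    # ASSUMPTION: Euclidean k-means is a stand-in for CDM-based clustering,
    # because the abstract does not specify the measure.
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)


def consensus(labelings, k):
    # Co-association matrix: the share of base partitions in which
    # points i and j fall into the same cluster.
    labels = np.asarray(labelings)
    co = (labels[:, :, None] == labels[:, None, :]).mean(axis=0)
    # Average-linkage clustering on the dissimilarity 1 - co-association,
    # cut so that at most k clusters remain.
    tree = linkage(squareform(1.0 - co, checks=False), method="average")
    return fcluster(tree, t=k, criterion="maxclust")


def run_pipeline(X, k, n_components=5, seed=0):
    # Step 1: reference partition of the original high-dimensional data.
    reference = cluster(X, k, seed)
    # Step 2: one partition per dimensionality reduction technique
    # (ASSUMPTION: these three reducers are illustrative choices).
    reducers = [
        PCA(n_components=n_components, random_state=seed),
        TruncatedSVD(n_components=n_components, random_state=seed),
        GaussianRandomProjection(n_components=n_components, random_state=seed),
    ]
    labelings = [cluster(r.fit_transform(X), k, seed) for r in reducers]
    # Step 3: merge the base partitions into one ensemble partition.
    combined = consensus(labelings, k)
    # Step 4: Rand index between the reference and the ensemble result.
    return reference, combined, rand_score(reference, combined)


# Synthetic data, used only to exercise the sketch.
X, _ = make_blobs(n_samples=150, n_features=50, centers=3, random_state=0)
reference, combined, score = run_pipeline(X, k=3)
print(f"Rand index (reference vs. ensemble): {score:.3f}")
```

A Rand index close to 1 would indicate that the ensemble over reduced representations largely reproduces the partition found on the original data. Swapping a real context-dependent measure into `cluster` would require a clustering routine that accepts a custom or precomputed dissimilarity.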
Similar resources
Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering
Clustering algorithms are highly dependent on factors such as the number of clusters, the specific clustering algorithm, and the distance measure used. Inspired by ensemble classification, one approach to reducing the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...
Spike Train SIMilarity Space (SSIMS): A Framework for Single Neuron and Ensemble Data Analysis
Increased emphasis on circuit level activity in the brain makes it necessary to have methods to visualize and evaluate large-scale ensemble activity beyond that revealed by raster-histograms or pairwise correlations. We present a method to evaluate the relative similarity of neural spiking patterns by combining spike train distance metrics with dimensionality reduction. Spike train distance met...
Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data
The emergence of high-dimensional data in various areas has brought new challenges to ensemble clustering research. To deal with the curse of dimensionality, considerable efforts in ensemble clustering have been made by incorporating various subspace-based techniques. Besides the emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimila...
Proximity Graphs for Clustering and Manifold Learning
Many machine learning algorithms for clustering or dimensionality reduction take as input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then partitioned (clustering) or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality red...